Incorporating Signal Awareness in Source Code Modeling: an Application to Vulnerability Detection
Abstract
AI models of code have made significant progress over the last few years. However, many models are actually not learning task-relevant source code features. Instead, they often fit non-relevant but correlated data, leading to a lack of robustness and generalizability, and limiting the subsequent practical use of such models. In this work, we focus on improving model quality through signal awareness, i.e., learning the relevant signals in the input for making predictions. We do so by leveraging the heterogeneity of code samples in terms of their signal-to-noise content. We perform an end-to-end exploration of model signal awareness, comprising: (i) uncovering the reliance of AI models of code on task-irrelevant signals, via prediction-preserving input minimization, (ii) improving models' signal awareness by incorporating the notion of code complexity during model training, via curriculum learning, (iii) improving models' signal awareness by generating simplified signal-preserving programs and augmenting them to the training dataset, and (iv) presenting a novel interpretation of the model learning behavior from the perspective of the dataset, using its code complexity distribution. We propose a new metric to measure model signal awareness, Signal-Aware Recall, which captures how much of the model's performance is attributable to task-relevant signal learning. Using software vulnerability detection as a use-case, our model probing approach uncovers a lack of signal awareness in the models, across three different neural network architectures and datasets. Signal-Aware Recall is observed to be in the sub-50s for models with traditional Recall in the high 90s, suggesting that the models are presumably picking up a lot of noise or dataset nuances while learning their logic. With our code-complexity-aware model learning enhancement techniques, we are able to assist the models towards more task-relevant learning, recording up to 4.8x improvement in model signal awareness. Finally, we employ our model learning introspection approach to uncover the aspects of source code where the model is facing difficulty, and we analyze how our learning enhancement techniques alleviate it.
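The relationship between traditional Recall and a signal-aware variant can be illustrated with a minimal Python sketch. The abstract does not give the metric's formula, so the reading used here is an assumption: a true positive counts as signal-aware only if the input, after prediction-preserving minimization, still contains the vulnerability. The greedy one-token-at-a-time reducer, the toy model `toy_predict`, and the checker `toy_has_signal` are all hypothetical stand-ins, not the authors' minimizer, models, or vulnerability oracle.

```python
from typing import Callable, List, Sequence, Tuple

Tokens = List[str]


def minimize(tokens: Sequence[str], predict: Callable[[Tokens], bool]) -> Tokens:
    """Greedily drop one token at a time while the model's positive
    prediction is preserved (a simplified stand-in for a real minimizer)."""
    current = list(tokens)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            # Keep the smaller input only if it is non-empty and still
            # predicted vulnerable.
            if candidate and predict(candidate):
                current = candidate
                changed = True
                break
    return current


def recall_and_signal_aware_recall(
    vulnerable_samples: Sequence[Sequence[str]],
    predict: Callable[[Tokens], bool],
    has_signal: Callable[[Tokens], bool],
) -> Tuple[float, float]:
    """Return (recall, signal-aware recall) over samples that are all
    labelled vulnerable. Assumed definition: a true positive is
    signal-aware if its minimized form still carries the vulnerability."""
    tp = fn = signal_tp = 0
    for tokens in vulnerable_samples:
        if predict(list(tokens)):
            tp += 1
            if has_signal(minimize(tokens, predict)):
                signal_tp += 1
        else:
            fn += 1
    total = tp + fn
    return tp / total, signal_tp / total


# Hypothetical toy model: fires on the real signal ("strcpy") but also on a
# spurious, merely correlated token ("tmp").
def toy_predict(tokens: Tokens) -> bool:
    return "strcpy" in tokens or "tmp" in tokens


# Hypothetical oracle: the vulnerability is present if an unsafe call remains.
def toy_has_signal(tokens: Tokens) -> bool:
    return "strcpy" in tokens or "gets" in tokens


samples = [
    ["strcpy", "tmp"],         # detected, but minimizes to the spurious token
    ["int", "strcpy"],         # detected via the real signal
    ["gets", "input"],         # missed by the model
    ["strcpy", "tmp", "len"],  # detected, but minimizes to the spurious token
]
recall, signal_aware_recall = recall_and_signal_aware_recall(
    samples, toy_predict, toy_has_signal
)
print(recall, signal_aware_recall)
```

In this toy setup the model detects three of the four vulnerable samples, yet only one detection survives minimization with the vulnerability intact, so the signal-aware figure falls well below plain Recall. This mirrors, in miniature, the kind of gap the abstract reports.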
Journal
Journal title: ACM Transactions on Software Engineering and Methodology
Year: 2023
ISSN: 1049-331X, 1557-7392
DOI: https://doi.org/10.1145/3597202